RESEARCH PAPER 


ESET: [EAGLES STUDENT EVALUATION OF TEACHING] - AN ONLINE 
ANDRAGOGICAL STUDENT RATINGS OF INSTRUCTION TOOL THAT 
IS AN IN-DEPTH SYSTEMIC STATISTICAL MECHANISM DESIGNED TO 
INFORM, ENHANCE, AND EMPOWER HIGHER EDUCATION 


By 

JAMES EDWARD OSLER II * MAHMUD A. MANSARAY ** 

* Faculty Member, North Carolina Central University, USA.

** Research Analyst, North Carolina Central University, USA. 

ABSTRACT 

This paper seeks to provide an epistemological rationale for the establishment of ESET (an acronym for "Eagles Student
Evaluation of Teaching") as a novel universal SRI [Student Ratings of Instruction] tool. Colleges and universities in the
United States use Student Ratings of Instruction [SRI] for course evaluation purposes (Osler and Mansaray, 2013a). This
research investigation is the third part of a post hoc study that psychometrically examines the reliability and validity of the
items used in a Historically Black College and University (HBCU) SRI instrument. The ESET sample under analysis consisted of
the responses to 56,451 total items extracted from 7,919 distributed Student Ratings Instruments that were delivered
electronically (at an HBCU) to students who completed the ESET tool. The ESET methodology provides a statistically valid
SRI/SET survey instrument along with a variety of post hoc statistical measures to determine the efficacy of collegiate 
instruction. This research is also the continuation of research conducted on innovative statistical metrics introduced in 
the i-manager's Journal on Mathematics.

INTRODUCTION

With the increased use of performance-based allocation 
(O'Shaughnessy, 2013) by many state legislatures as a 
benchmark of student learning in funding many United 
States Public Universities, the burden to promote quality 
learning and effective teaching in public universities 
increasingly falls within the ambit of the university 
administrators (Chancellors, Provosts, Deans, and Chairs). 
Simultaneously, the demand to satisfy students'
longing for superior grades (Agbetsiafa, 2010) is
influencing college administrators to make progressively
greater use of diverse methods of evaluating the quality of
teaching in order to formulate decisions on college programs,
retention and graduation rates, student learning
outcomes, and faculty employment, among others.
Strategic plans that would enhance student success, such as
faculty-student advising, are already in place in some
universities, and others are constantly
exploring ways to further improve effective
teaching and student success. Indeed, an assessment tool
recurrently used by college administrators in state 
universities to assess teaching effectiveness and student 
learning for personnel policy decision-making, including 
other summative purposes (tenure, merit increase, 
retention for non-tenured), is the "Student Evaluations of 
Teaching" (SET) (Agbetsiafa, 2010; Safavi, Bakar, Tarmizi, & 
Alwi, 2012; Seyedeh & Kamariah, 2013; Taylor, Grey, & 
Satterthwaite, 2013). Agbetsiafa (2010) noted that SETs
are becoming ever more prominent in summative and
formative procedures in universities because they
present a strategic, methodical, and valued means of
obtaining feedback on students' responses to instructors
and courses. Administrators in state
universities increasingly draw inferences from SET results to
make personnel decisions regarding curriculum
improvement, teaching effectiveness, and student learning
outcomes (Agbetsiafa, 2010; Donnon, Delver, & Beran,
2010; Kozub, 2010; Madden, Dillon, & Leak, 2010).
Arguably, even though several academics conclude
that SETs are dependable tools, researchers remain divided
about the trustworthiness of the
evaluation design for faculty assessment. There is clearly
less agreement among academics about the general
validity (soundness) and reliability (trustworthiness) of SETs with
respect to the degree to which the evaluation designs appropriately
assess tangible constructs (e.g., teaching quality) or present a
complete rating of the course or educator (Agbetsiafa,
2010; Beran & Rokosh, 2009; Clayson, 2009; Osler &
Mansaray, 2013b). In effect, while some researchers
maintained that there was hardly any proof of an association
between student ratings and teaching effectiveness
(Madden et al., 2010; Otani, Kim, & Jeong-IL, 2012), others
regarded the ratings as a significant evaluation of
teaching efficiency and student learning (Frick, Chadna,
Watson, Wing, & Green, 2009; Zhao & Gallant, 2012).
However, because student ratings are used recurrently in
university administrative policy decisions, and because
academics have not reached agreement
on the validity and reliability of the current design for assessing
faculty and teaching quality, it is crucial to endorse the
extension of the model to include other measures for
evaluating teaching quality and student learning. Thus, the
objective of this paper is to promote the expansion of
student ratings of instruction to include the ESET, in order to
evaluate teaching quality (and its
antecedent, teaching effectiveness) effectively and efficiently, and thereby to provide
a new mechanism that contributes to the metrics
used for teaching evaluation by the collegiate administrative
sector.

1. Key Terminology 

Validity: The validity of an assessment design is the degree
to which the device measures what it is proposed to
measure (Agbetsiafa, 2010; Chen & Watkins, 2010; Leedy &
Ormrod, 2009; Zhao & Gallant, 2012).

Reliability: The reliability of an assessment design is the
consistency with which the evaluation instrument
generates an outcome when the entity that is being
evaluated stays unchanged (Agbetsiafa, 2010; Chen &
Watkins, 2010; Leedy & Ormrod, 2009; Zhao & Gallant,
2012).

Formative Assessment: Student ratings are used by many 
Universities to assess faculty at the end of every semester for 
teaching effects which may be used in curriculum 
improvement, teaching effectiveness and student learning 
outcomes (Agbetsiafa, 2010; Donnon, Delver, & Beran, 
2010; Kozub, 2010; Madden et al., 2010).

Summative Assessment: Student ratings are used by many 
Universities to assess faculty at the end of every semester for 
teaching effects which may be used for faculty promotion, 
tenure and faculty hiring (summative) (Agbetsiafa, 2010; 
Safavi et al., 2012; Seyedeh & Kamariah, 2013; Taylor et al.,
2013). 

2. Operational Definition of ESET Variables 

The operational variables in the ESET research include the 
following independent and dependent (construct) 
variables: 

Independent (Internal Construct) Variables: The 
independent variables for this study were the questions on 
the survey instruments envisioned to assess faculty on 
course programs and teaching effectiveness contained 
within the ESET instrumentation infrastructure. The questions 
included in ESET were on: course design, instructional
method of the teacher, examinations and feedback,
instructor's enthusiasm and communication, among
others. Indeed, Osler and Mansaray (2013b) applied
reading scripts and instructor's passion, among others, as
independent variables in their analysis of teaching 
effectiveness and student learning. 

Dependent (External Construct) Variables: The dependent
variables measured the effect of the independent
variables. The dependent variable is teaching quality. The
global overall teaching effectiveness rating on the instrument is
assumed to indicate teaching effectiveness and to give the
perception of student learning. Agbetsiafa (2010) applied
effective teaching as the dependent variable in his
construct validity study of teaching effectiveness and student
learning.

3. Literature Review

The term "andragogy" describes a theoretical framework in
adult learning conceptualized by Knowles (1974) to distinguish
adult learning from conventional pedagogy. The
pedagogical approach is grounded on a few
assumptions, including the proposition that because students
rarely have adequate foundational knowledge, they rely
on the teacher for direction concerning their educational
requirements (Forrest & Peterson, 2006). Knowles (1980)
described andragogy as the art and science of
assisting adults in learning, which he contrasted with
pedagogy as the art and science of assisting children in
learning. Knowles set his theory against the
backdrop that children and adults learn differently and, in his
view, appear to respond differently to their
teachers.

The five assumptions underlying Knowles' (1980) 
andragogy define the adult student as: 

1. An individual who has an independent self-image
and who directs his or her own education.

2. An individual who has accumulated a reservoir of life
experiences, which offers a rich foundation
for learning.

3. An individual whose learning needs are closely
associated with changing social roles.

4. A problem-centered individual who is interested in the
immediate application of knowledge.

5. An individual motivated to learn by internal rather than
external factors.

Based on these assumptions, Knowles (1980) proposed a
program-development model for creating, applying, and
assessing educational experiences with adults. The
implication of the first assumption, for example, is that as
adults mature, they become more self-regulating and
self-directing. Concerning this assumption, Knowles (1980)
noted that the classroom environment ought to be adult in
character, both physically and psychologically. He further said that in the
adult classroom, adults feel recognized, appreciated,
and supported, and there is an atmosphere of
mutuality between instructors and students as joint
inquirers. He surmised that these assumptions form the
foundation of adult learning. This notion is also supported by
Forrest and Peterson (2006), who noted that, unlike children,
adults learn from their
large store of life experiences, which
reinforces their capacity to determine for themselves what they
lack and therefore need to learn. In addition, adult students are also
likely to want a greater sense of collaboration between
the student and the instructor as they advance through the
educational process (Zmeyov, 1998).

Knowles' (1980) andragogy developed into one of the
most contentious and debated concepts in the field of
adult education (Brookfield, 1986; Davenport & Davenport,
1985; Hartree, 1984). Many scholars find it
difficult to accept andragogy as a theory. Indeed, in one
of these debates, Davenport and Davenport (1985) noted
that andragogy has been categorized, on different occasions,
as a procedure of adult education, a theory of adult
learning, a method of adult education, and a theory of
adult education, among others. Hartree (1984) even
questioned whether there was a theory of any kind, suggesting
that perhaps the assumptions were merely principles of
good practice, or descriptions of what the adult learner
ought to be. Additionally, an area of ongoing
disagreement is the extent to which the assumptions are
features of adult learners alone. Although it may be
true that many adults are autonomous learners,
some adults are highly reliant on an
instructor for structure. Conversely, some children
are autonomous, self-directed learners. Moreover, even
the more apparent assumption that
adults have more and deeper life
experiences may not always work favorably in a
learning situation. Indeed, certain life experiences can
serve as obstacles to learning (Merriam, Mott, & Lee,
1996). In addition, children in certain circumstances may
have a range of experiences qualitatively richer than
those of some adults.

The persistent argument among
scholars that the andragogy assumptions were not
necessarily true of all adults spurred Knowles (1980) to
revise his own thinking regarding whether
andragogy was only for adults and pedagogy only for
children. This induced him to move away from an
andragogy-versus-pedagogy position toward
representing the two on a continuum extending from
teacher-directed to student-directed learning. Knowles
recognized that both approaches are appropriate with
children and adults, depending on the circumstances.
An adult who knows little or nothing about a
subject, for example, will be heavily reliant on the instructor
for direction (Knowles, 1984). At the same time,
Knowles also recognized that children who are naturally
curious and are self-directing in their learning
outside of school, among others, may also be more
self-directed in school. Knowles' revised assumptions
have resulted in a new definition of andragogy, which relies
more on the learning situation than on the learner. Today,
the application of andragogy in teaching and learning
may depend on whether the academic focuses on the
teacher-directed or student-directed aspect of learning.

Indeed, in spite of the continued contention about the
theoretical standing of andragogy, it is widely used
among academics and researchers worldwide, and its
research base is steadily growing (Amrein-Beardsley
& Haladyna, 2012; Gilstrap, 2013; Savicevic, 1991; Young,
2012). Gilstrap (2013), for example, applying a
quantitative technique, produced a synopsis of
andragogy theory in relation to teaching philosophies
among librarians who were ACRL members, using Hadley's
Educational Orientation Questionnaire. Based on the
theoretical framework, the study found a nonlinear and
negative correlation between librarians'
understanding of the ACRL Standards and their adult
learning orientation ratings, p = .047 (p < .05). The results also
highlighted the significance of adult learning as well as the
assessment of teaching philosophies. Similarly, Forrest
and Peterson (2006) applied andragogy in
management. The authors noted that the andragogical
approach was essential in management education to
help prepare students for their working environment.
Furthermore, the authors surmised that current
management requires the application of skills learned, and
not rules of ethics. Therefore, without application, the
student cannot adapt to the evolving workplace.
Amrein-Beardsley and Haladyna (2012), meanwhile,
referenced the andragogy theory of adult learning to
generate and validate a survey to assess teaching
effectiveness. The authors also noted that an assessment
survey based on a theory that defines an effective teacher
increases the likelihood of validation, and that aligning
the survey items with a theory-based explanation
of effective teaching decreases the total number of
items required. Consequently, there is less chance that
halo rating errors will decrease subscore validity.

4. Purpose of the Study 

For the purpose of this research, the applicability of
andragogy is noteworthy in determining the validity and
reliability of both Student Evaluations of Teaching (SET) and
the Graduating Senior Survey (GSS) combined as a
measurement of teaching effectiveness and student
learning. First, the theory takes into account that the
participating students in both survey designs are adult
learners, and that learning takes place within the
classroom. Depending on the situation, the classroom
setting may operate as either teacher-directed or
student-directed learning. In addition, the first assumption in the
andragogy model concerns adults' need to know why
they are learning before they can participate in the
learning process (Knowles, 1984). This assumption
corresponds to a few individual items on both the SET and GSS
survey designs, where students are asked about their
objectives in learning as they relate to their learning
outcomes. In addition, the SET design takes into account
the professional relevance of the course to the degree
program (Amrein-Beardsley & Haladyna, 2012) in relation
to instructional effectiveness. This aspect is also
connected with the first assumption of the andragogy
model. Both the SET and GSS designs
also require students to rate themselves on a few survey
items concerning their learning efforts and motivation.
Such questions relate to the fourth and fifth
assumptions of the model. In sum, the andragogy model
is a proper theoretical fit for this research.

5. Background of ESET Research 
5.1 Existing SET Research

The existing division among researchers concerning
the validity and reliability of the SET instrument as a
measurement of teaching effectiveness means that SET alone
cannot guarantee the accuracy of the instrument as a
measure of teaching effectiveness. This has encouraged
some researchers to endorse other methods of assessing
teaching effectiveness (El Hassan, 2012; Marsh et al.,
2011). Even so, most of the available literature on the
measurement of teaching effectiveness is on SET because
of its popularity in many universities, in spite of
questions raised by some academics about its reliability. It is against
this background that the literature review of this research
includes a discussion of SET. Additionally, while there is hardly
any information on the GSS survey as a measurement of
teaching effectiveness, SETs are widely used in many
universities worldwide as a proxy for teaching effectiveness
and, by implication, the student perception of learning
(Carrell & West, 2010; Hatfield & Coyle, 2013; Madden et
al., 2010; Spooren, Brockx, & Mortelmans, 2013).
There is a growing body of literature on the utility, validity, and
reliability of student evaluations of faculty, and
several journal articles on the evaluation of university faculty on
teaching effectiveness and student success make use of
the Student Evaluations of Teaching (SET) design (Clayson,
2009; Donnon et al., 2010; Otani et al., 2012; Pritchard &
Potter, 2011; Ruppert & Green, 2012; Zhao & Gallant,
2012).

5.2 The Utility of the ESET Design

A practice that is predominant in many universities and
colleges in the United States and elsewhere is the use
of SET to assess instructional effectiveness and student
success (Donnon et al., 2010; Spooren, Mortelmans, &
Denekens, 2007; Stowell, Addison, & Smith, 2012). This
practice is supported by agreement among some
scholars that SETs appear sufficiently useful and
effective in evaluating what they are meant to measure: teaching
efficiency, student learning satisfaction, educational
knowledge, and program curriculum (Agbetsiafa, 2010;
Beran, Violato, & Kline, 2007; Skowronek, Friesen, &
Masonjones, 2011; Zhao & Gallant, 2012). The purpose of
student evaluations of faculty in colleges and
universities is to deliver helpful feedback to faculty for
teaching improvement and a superficial assessment
of teaching efficiency for personnel or administrative
decisions, in addition to supplying information to students
for choosing courses and instructors (Marsh and
Roche, as cited in Zhao & Gallant, 2012; Beran, Violato,
Kline, & Frideres, 2009). Additionally, the application of SETs
as a measurement of teaching effectiveness is significant
because they offer a planned, systematic, and effective
medium for receiving feedback on students' reactions to
teachers and courses (Agbetsiafa, 2010), and they have been
in use since the mid-1920s (Cohen, as cited in
Donnon et al., 2010; Apollonia & Abrami, as cited in Safavi
et al., 2012). Faculty also use SETs
to obtain students' responses concerning their courses
and to document development in their instructional roles and
responsibilities, which is an important influence on their
careers. Beran and Rokosh (2009), for example,
reported from a survey of 262 university teachers that 84%
of the respondents supported the use of SET overall,
and 62% of the respondents felt that department heads
and deans used SET results properly. However, the
way in which instructors use SET differs according to
background and experience (Spooren et al., 2007). To this
end, Arthur (2009) argued that responding to feedback is a
multifaceted process, and he accordingly established a
typology of factors (e.g., personality, student characteristics,
teaching and learning strategies) that influence teachers'
individual responses to negative feedback (i.e., blame,
shame, etc.). Meanwhile, Aleamoni (as cited in Zhao &
Gallant, 2012) also recommended the use of
student ratings because students can provide information
on the accomplishment of essential educational
objectives, rapport with the instructor, and elements of a
classroom, such as instructional materials, assignments,
and instructional processes.

Additionally, student ratings are also used to provide
information to students and to inform
administrative decisions, such as granting tenure
and promotion, evaluation of curriculum programs,
faculty hiring, and improvement in teaching performance
(Beran et al., 2007; Beran et al., 2009; Kozub, 2010;
Seyedeh, Kamariah, Rohani, & Alwi, 2012; Seyedeh &
Kamariah, 2013). Furthermore, some studies revealed that
students tend to take teaching
assessments seriously, and are eager to contribute and
offer meaningful answers when they believe and
understand that their contributions are considered and
integrated by their instructors and the university (Agbetsiafa,
2010). Beran et al. (2007) also noted that by helping
instructors to improve their instruction through rating
feedback, administrators may be able to oversee
specific course developments. Such information may be
important to administrators, who can use it to
monitor and improve teaching effectiveness overall.
Aggregating such information may also make it possible to gauge
teaching quality in a department or program relative to
other programs (Beran et al., 2007). The authors further
added that procedural decisions may also be informed
by ratings, which may determine instructors' course
assignments in succeeding terms. Spooren et al. (2013)
argued that the appropriate gathering and interpretation of
SET data depends upon administrators having thorough
methodological preparation and systematic briefing on
the major outcomes and trends in the study field.

Nevertheless, in spite of the widespread use of SET, students'
perceptions concerning the utility of SET ratings were
lacking, ambivalent, and not properly understood (Beran et
al., 2009; Spooren et al., 2013). Spooren et al. (2013) further
said that students' ambivalence about the utility of SET stemmed from
the perception that their assessments may not be taken
seriously by either the faculty or the administration for
enhancing teaching quality. In addition, many faculty
members also remained apprehensive about the
summative application of SETs in personnel decisions, such as
faculty retention, promotion, salary increases, and tenure
(Beran & Rokosh, 2009; Beran et al., 2007). This
apprehension was connected with a lack of
knowledge about the efficacy of the ratings data, or with
unease that administrators were exploiting the ratings data
in personnel decisions (Beran & Rokosh, 2009).
The authors further added that SET ratings
had only limited use in refining specific aspects of
instruction, such as choosing course resources and
organizing assignments and exams.

5.3 Psychometric Properties: ESET as a SET Model—Items 
and Measurement 

There are several existing tools, both online and direct,
that can be used for assessing faculty on teaching
effectiveness and the perception of student learning.
Several questions on SETs aim to measure the instructor on
teaching effectiveness and program outcomes
(Anastasiadou, 2011; Baker, Pollio, & Hudson, 2011;
Chulkov & Van Alstine, 2012; Fitzpatrick & Miller-Stevens,
2009; Keeley, Furr, & Buskist, 2010; Skowronek et al., 2011;
Spooren, 2010; Zhao & Gallant, 2012). The SET designs are
independent student survey instruments used to assess
teaching excellence and student learning outcomes.
Examples include the British Noel-Levitz Student Satisfaction
Inventory, the Course Perception Questionnaire, the
Student Evaluations of Educational Quality, the CoursEval,
the Course Experience Questionnaire, the Student
Instructional Report, and the Student Teaching Evaluation
Instruments, among others (Liu, 2012; Young & Duncan,
2014). Even with their different labels, the majority
of the instruments have comparable characteristics and
individual learning items, which are summed to
generate a teaching effectiveness score (Skowronek et al.,
2011). SETs usually contain a number of Likert-scale
questions (on scales ranging from 4 to 10 points)
that ask students to assess several aspects of the
instructor's teaching and course design.
These forms or surveys are completed
by the students at the end of every semester and frequently
function as a summative measure in administrative
decisions about faculty tenure, promotion, and merit pay
(Johnson, Narayanan, & Sawaya, 2013; Mau & Opengart,
2012; Venette, Sellnow, & McIntyre, 2010), and as a
formative measure for enhancing teaching abilities and
course design (Donnon et al., 2010; Dorasamy & Balkaran,
2013; Osler & Mansaray, 2013).

SETs usually contain Likert-scale questions on
teaching effectiveness. The Universal Student Ratings
Instrument, for example, was a SET design introduced at a
Canadian graduate university. It had 12 items on a Likert
scale, each rated independently of the others, and used courses of
comparable kind and scope as a basis for assessment. The
authors further said that 11 of the 12 items were designed to
produce specific ratings on components of the course, using a
7-point scale from 1 = strongly disagree to 7 = strongly
agree. The 12th item on the design was an overall global
rating of the quality of the course instruction, measured on
a different 7-point scale, which ranged from 1 =
unacceptable to 7 = excellent. Osler and Mansaray (2013)
also noted the applicability of the CoursEval design
for assessing faculty at a university in the Southern United States.
This is a rating design on a 5-point Likert scale,
administered online every semester, that requires students
to assess their instructors on 12 items, which are
summed into a teaching effectiveness score. The questions on the
rating scale include:

1. The identified goals and objectives for this course are in 
accordance with what was actually taught. 

2. The subject material of this course is soundly organized. 

3. The instructor plainly delivers his/her subject substance. 

4. The instructor is passionate and stimulates interest in his/her
course.

5. My ability to reason, critique, and/or create
has been enriched as a result of this course.

6. The texts and other readings allocated for this course 
were supportive. 

7. The instructor applies instructional methods (for 
example, discussions, lectures, audio, visuals, field 
work, demonstrations, computer programs, etc.), 
which effectively improve learning in this course.

8. The examinations are in accordance with the course 
objectives and the instruction. 

9. Quizzes, examinations and/or written assignments are 
delivered often enough to help me assess my growth. 

10. The instructor is sincerely concerned with students' 
advancement. 

11. I am able to acquire assistance from the instructor
when I require it.

12. This instructor is effective in endorsing learning. 

The 5-point Likert scale on each item ranged from
1 = strongly disagree to 5 = strongly agree. The design
also has three sub-items on student effort in the course,
whose values ranged from 1 = never to 5 = all of the time.
Many research studies treat the Likert
scale on SETs as an interval scale (Chulkov & Van Alstine,
2012; Zhao & Gallant, 2012), thus making it possible to
apply quantitative techniques to examine the reliability
and validity of the instruments.
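
As an illustration of what treating Likert responses as interval
data permits, the following minimal Python sketch sums one student's
ratings on a 12-item, 5-point form into a single teaching
effectiveness score and also reports the item mean. The function
name, the range check, and the sample ratings are hypothetical; they
are not taken from the CoursEval instrument or from the ESET data.

```python
from statistics import mean


def teaching_effectiveness_score(ratings, scale_min=1, scale_max=5):
    """Return (sum, mean) of one respondent's Likert item ratings.

    The ratings are treated as interval data so that they can be
    summed and averaged. Names and defaults are illustrative
    assumptions, not part of any published instrument.
    """
    if not ratings:
        raise ValueError("at least one item rating is required")
    for rating in ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(
                f"rating {rating} is outside {scale_min}-{scale_max}"
            )
    return sum(ratings), mean(ratings)


# Hypothetical responses of one student to the 12 items
# (1 = strongly disagree, 5 = strongly agree).
sample = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 5]
total, average = teaching_effectiveness_score(sample)
print(total, round(average, 2))
```

Reporting the mean alongside the sum keeps scores comparable across
forms with different numbers of items, which is one reason a study
might prefer it; the source describes only the summed score.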

Even with the widespread use of the SET design, it is argued
that the majority of SETs rely on a single-item methodology
of faculty measurement, and that measuring instructional skills
with a single item makes the
given responses harder to interpret (Spooren et
al., 2007). In addition, there is still disagreement among
academics about the reliability and validity of SET as a
measurement of teaching effectiveness (Anastasiadou,
2011; Galbraith & Merrill, 2012; Galbraith, Merrill, & Kline,
2012; Skowronek et al., 2011; Zhao & Gallant, 2012).

5.4 The Reliability of ESET and SET 

As the reliance on student ratings has increased over
time, so has the number of research studies on the
psychometric properties of ratings, particularly reliability
and validity (Beran & Rokosh, 2009). The reliability of
SET refers to the consistency of ratings among different raters
and the stability of such ratings over time. In other words,
reliability is concerned with the internal consistency, stability,
and dependability of the design applied to assess
teaching effectiveness. Therefore, given
that the SET design is extensively used in many universities,
it is important that the data resulting from these
instruments function as a reliable measure of teaching
quality and course improvement (Agbetsiafa, 2010).
McMillan (as cited in Zhao & Gallant, 2012) also noted that a
dependable result is one that shows equivalent performance
at different times. Notwithstanding the enduring
controversy surrounding the reliability of SETs, several studies
found that student evaluations of teaching are reliable and
stable across items, raters, and time periods. Because
reliability is a principal foundation of validity, having high
reliability is critical for summative and formative evaluations
(Haladyna & Amrein-Beardsley, 2009).

Several different statistical models can be used to
determine the reliability of the rating tools used to
assess faculty. Most of these assessments focus on the
internal consistency (stability) of the ratings. The most
established is the Cronbach's alpha statistic, with
alpha varying from 0 to 1, 1 being the maximum
reliability score. In their study, Dorasamy and Balkaran
(2013), for example, assessed student ratings of teaching
aptitudes for use in program evaluation, making use of a
sample of 3,060 within the Faculty of Management
Sciences at the Durban University of Technology. Using
teachers' questionnaires on a Likert scale, the authors
completed a reliability test by administering several
measurements on the same subjects, and an overall
reliability score of 0.949 was achieved, indicating a high
degree of internal consistency of the responses. Beran et al.
(2009) likewise conducted a study to determine what
students find valuable in student ratings. Using
survey responses from students (n = 1,229) at a prominent
Canadian university, the authors established a
psychometrically extensive measurement of the utility of
student ratings. Using the Cronbach's alpha model, the
results confirmed the internal consistency reliability of the
16 items on the rating scale at 0.93, thus signifying a
high level of internal consistency for the SET ratings.
Donnon et al. (2010) correspondingly found a high
level of internal consistency, with a Cronbach's alpha
coefficient of 0.93, in their research on SET in medical
sciences graduate programs. Osler and Mansaray (2013)
also noted a Cronbach's alpha of 0.954 for the 15 items on
the SET scale in their study on the validity and reliability of
independent instructional measures at a Southern state
university, confirming the high internal consistency of the
measuring instrument. Agbetsiafa (2010) also found a high
level of internal consistency in the student answers to the
rating items in his study. Anastasiadou (2011) similarly reported a
Cronbach's alpha of 0.908, which was over the threshold,
endorsing a strong internal consistency of the
instrument. Still, there are disagreements among scholars
concerning the validity and reliability of SET. Galbraith et al.
(2012), for example, found little or no support for the
soundness of SET as an overall gauge of teaching
effectiveness or student learning.

5.5 The Validity of ESET and SET Research 

There are several journal articles on the validity of the SET
design. However, validity has always been a contentious subject
in SET research, and the problem remains unresolved among
academics. In general, the validity of SETs indicates the degree
to which student ratings actually assess what they are intended
to evaluate (Zhao & Gallant, 2012). There are several types of
validity studies, including content validity, construct
validity, external validity, and criterion-related validity,
among others. However, an integrated approach to validity and
validation has emerged more recently (Kane, as cited in
Haladyna & Amrein-Beardsley, 2009). In this unified view,
validity refers to the truth of an interpretation of a score,
for instance, the results from student evaluations of teaching
(Haladyna & Amrein-Beardsley, 2009). Moreover, the authors noted
that validity may also relate to the application of student
ratings, citing research whose objective was to assist the
instructor in improving methods of instruction. Then again, the
validity of this information for formative uses was
questionable.

In spite of the recent unified view of validity, some scholars
remain interested in a narrower focus on validity, particularly
construct validity, in connection with SET as a measurement of
teaching effectiveness. Agbetsiafa (2010) argued for the use of
construct validity by noting that, if student ratings are to be
applied to evaluate teaching effectiveness and student learning
outcomes, the instruments must be open to rigorous validity
testing and examination. This argument was initially postulated
by Cronbach and Meehl (as cited in Zhao, 2012), who said
construct validity was the degree to which an apparent
measurement mirrored the central theoretical construct that the
researcher had planned to assess. Skowronek et al. (2011) also
said it was crucial to discuss concerns associated with
construct validity, including whether the nature of the student
rating technique was reasonable for the construct being
assessed.

Several research studies have applied distinct statistical
models, including factor analysis, stepwise linear regression,
multivariate analysis of variance, and structural equation
models, among others, to validate the SET tool. Agbetsiafa
(2010), for example, used factor analysis to examine the
construct validity of the SET in determining the association
between teaching effectiveness and student ratings in a
university-level course in Economics at the University of
Indiana. Using a sample of students (N = 1300), the
Kaiser-Meyer-Olkin (KMO) statistic on the rating scale was
0.912, signifying the suitability of factor analysis for the
data. In addition, Bartlett's test for the presence of
relationships among the variables was significant at
p < 0.0001. In sum, the results found positive associations
between student perception of teaching effectiveness, education
support, effective communication, clarity of course work, and
course assessment and feedback, thereby confirming the construct
validity of the rating tool. Similarly, Zhao and Gallant (2012)
applied confirmatory factor analysis via a structural equation
model to examine the validity of the SET design at a Midwestern
university in the United States for both administrative and
instructional decisions. Using a sample of students
(N = 73,500) who completed the assessment, the results revealed
the model was an acceptable fit for the data, and that
instruction effectiveness was appropriately and satisfactorily
evaluated by the 10 observed variables in the SET survey.
Likewise, using principal component factor analysis with varimax
rotation, Osler and Mansaray (2013) found a significant
relationship among the 15 items on the rating scale (p < .000).
Furthermore, the 15 items all had factor loadings of >= 0.82,
thus establishing the construct validity of the ESET.

El Hassan (2009), on the other hand, focused on the substantive
and consequential validity of student ratings. His research
discussed concerns of substantive and consequential validity,
and maintained that these could be fully addressed where the
assessment techniques were completely planned and realized,
including effective communication to students and faculty
concerning the purpose of the evaluation techniques. Using
descriptive statistics on a 5-point Likert scale, the author
noted that about 70% of the students acknowledged the student
ratings to be the standard for offering suggestions for
improvement, and about 50% of students also said faculty value
their input in developing their teaching. The study also found
that several instructors value the contribution from the ratings
and applied it to course improvement. Stowell, Addison, and
Smith (2012), meanwhile, examined whether there was a difference
in response rate and validity between online and classroom-based
student evaluations of faculty. Using a sample of students
(N = 2057) who responded to the SET survey, the authors used
t-statistics and correlation matrices and found no significant
differences in the mean ratings between the online format and
classroom-based student ratings, thus establishing that the two
assessment formats generated comparable data and validity
results.

5.6 Alternative Arguments Regarding ESET and SET 

Even with the extensive application of SET as a measurement of
effective teaching in many colleges and universities, it
continues to be fraught with controversies and biases, which
appear to challenge its validity on several fronts. Arguably,
student ratings of faculty differ according to several student
characteristics. Students who anticipate a high letter grade in
the course appear to offer high ratings of instruction, and an
instructor who embraces a more lenient grading standard when
applying subjective testing techniques may receive higher
student assessments of teaching performance. Adding to this bias
are the findings of Slocombe, Miller, and Hite (2011), who noted
that students were inclined to offer higher evaluations to
instructors who used humor and to instructors they liked. The
authors also noted that students did not offer higher
evaluations to male instructors or to those below 55 years of
age. In addition, Galbraith et al. (2012), in their study of
effective teaching, found minimal or no support for the validity
of SET as an overall indicator of teaching effectiveness or
student learning. Other studies also noted the influence of
gender and race on student ratings of faculty, while Ibrahim
(2011) said that class size had a positive influence on the
dependability of student evaluations of instruction, and that
ratings received from larger classes were more dependable than
ratings from smaller classes.

In sum, the ongoing inquiry surrounding the overall validity of
SET alone as a measurement of teaching effectiveness underscores
the need to expand SET to include other measures of teaching
efficiency, particularly for personnel decisions (El Hassan,
2012). To this end, the addition of the Graduating Senior Survey
(GSS) design to the SET design as a combined measurement of
teaching effectiveness and student learning for personnel
decisions may be crucial for added reliability and validity.
This is because, although the GSS is similar to the initial SET
ratings on the same scale, diverse scale items concerning the
quality of instruction, courses, curriculum, admissions, and
other subjects can offer fresh information (Berk, 2005).

6. Statement of the Problem 

The current impasse among academics is their failure to reach
agreement on the reliability and validity of SET in assessing
teaching quality and student learning in universities
(Agbetsiafa, 2010; Beran & Rokosh, 2009; Clayson, 2009;
Dorasamy & Balkaran, 2013; Otani et al., 2012). Beran and Rokosh
(2009) and Madden et al. (2010) found hardly any evidence of a
connection between SET and teaching effectiveness, reflecting
this variation. However, some positive relations have been
observed between SET and teaching effectiveness. Because of this
essential disparity among academics, it is arguable that SET
alone cannot adequately establish instructional excellence and
student learning across universities, notwithstanding its
popularity. Indeed, Beran and Rokosh (2009) noted that
instructors did not consider SET to be a perfect model for
teaching effectiveness. This may be true concerning class size,
as a SET obtained from small classes is probably unreliable
(Ibrahim, 2011). Thus, the research problem is to explore
whether the combination of SET and GSS can produce more robust
reliability and validity in measuring teaching quality and
student-learning outcomes than the conflicting results derived
from SET alone. The knowledge gained will help improve the
currently flawed information base concerning combining SETs with
other evaluation designs for assessing teaching effectiveness
(Berk, 2005; Marsh, Ginns, Morin, Nagengast, & Martin, 2011). It
will also produce a new perspective, generate robust validity
and consistency, and expand on the existing methods of
evaluating teaching effectiveness and student outcomes
(El Hassan, 2009; Kozub, 2010; Skowronek, Friesen, &
Masonjones, 2011). Without this research, the significance of
the GSS as a supplementary teaching assessment design may remain
unclear.


7. Purpose of the ESET Tool 

The purpose of the ESET is to provide a statistically valid
student rating instrument that can be used to determine
instructional efficacy. The ESET is completed by currently
enrolled students every academic semester at an HBCU via
online surveys with set questions on a 5-point Likert scale.
The instrument is independently administered near the close of
every semester by the 'Research, Evaluation, and Planning'
division of the HBCU. The tool is released electronically to all
students at the university every semester. In this manner,
courses and faculty are evaluated every semester. Thus, the
university administration has the students evaluate faculty
on teaching effectiveness and overall engagement that
leads to student learning.

8. Research Questions 

The research questions (Osler and Mansaray, 2013a) listed
below were developed to examine the validity and
reliability of the Student Ratings of Instruction [SRI/ESET]
instrument used in the study to evaluate teaching quality.

Q1: Do ratings completed by students engender internal
reliability [consistency] in their measurement of teaching
effectiveness?

This question calls for a quantitative research design. The
ratings from the survey data at the HBCU were used to
answer the question of reliability. This is specified in
hypotheses $H_{10}$ and $H_{1a}$. The statistical tool applied
is Cronbach's Alpha, to verify the reliability [consistency]
of the instrument used to evaluate teaching effectiveness.

Q2: Do ratings completed by students produce
augmented validity in measuring teaching effectiveness?

The question calls for a quantitative research design. Again,
the research made use of existing data at the HBCU to
answer this question. This is outlined in hypotheses
$H_{20}$ and $H_{2a}$. The study applies factor analysis to
determine the validity [construct validity] of the instrument
used to determine teaching effectiveness.

9. Research Hypotheses 

The following hypotheses (Osler and Mansaray, 2013a)
were used to assess research questions Q1 and Q2.
Each research question addresses a null hypothesis, which
anticipates a non-significant association, and an
alternative hypothesis, which suggests that a significant
association does occur between the variables.

$H_{10}$: The student ratings do not increase the reliability of
the instrument used to assess teaching effectiveness.

$H_{1a}$: The student ratings significantly increase the
reliability of the instrument used to assess teaching
effectiveness.

$H_{20}$: The ratings completed by students do not create any
validity of the rating instrument used in evaluating
teaching effectiveness.

$H_{2a}$: The ratings completed by students generate
increased validity of the rating instrument used in
evaluating teaching effectiveness.

9.1 Tri-Squared Test Mathematical Hypotheses

The first set of Mathematical Hypotheses used in the study
in terms of Tri-Squared to determine SRI item efficacy,
validity, and reliability was as follows:

$$H_{10}: Tri^2 = 0$$
$$H_{1a}: Tri^2 \neq 0$$

9.2 Cronbach's Alpha [α] Mathematical Hypotheses

The second set of Mathematical Hypotheses used in the
study in terms of Cronbach's Alpha to determine reliability
was as follows:

$$H_{10}: \alpha < 0;$$
$$H_{1a}: \alpha > 0.$$

The second set of Mathematical Hypotheses used in the
study in terms of Cronbach's Alpha to determine validity
was as follows:

$$H_{20}: \alpha < 0;$$
$$H_{2a}: \alpha > 0.$$

10. Procedure of Data Analysis: Statistical Models

10.1 Tri-Squared Test [$Tri^2$]

Tri-Squared comprehensively stands for "The Total
Transformative Trichotomous-Squared Test" (or
"Trichotomy-Squared"). The Total Transformative
Trichotomous-Squared Test provides a methodology for
the transformation of the outcomes from qualitative
research into measurable quantitative values that are used
to test the validity of hypotheses. It is based on the
mathematical "Law of Trichotomy". The advantage of this
research procedure is that it is a comprehensive, holistic
testing methodology designed to be a static way of holistically
measuring categorical variables, directly applicable to
educational and social behavioral environments where the
established methods of pure experimental designs are easily
violated. The unchanging base of the Tri-Squared Test is the 3x3
Table based on Trichotomous Categorical Variables and
Trichotomous Outcome Variables. This emphasis on three
distinctive variables provides a thorough, rigorous
robustness to the test that yields enough outcomes to
determine if differences truly exist in the environment in
which the research takes place (Osler, 2013). The
Tri-Squared research procedure uses an innovative series
of mathematical formulae that do the following as a
comprehensive whole: (1) Convert qualitative data into
quantitative data; (2) Analyze inputted trichotomous
qualitative outcomes; (3) Transform inputted trichotomous
qualitative outcomes into outputted quantitative
outcomes; and (4) Create a standalone distribution for the
analysis of possible outcomes and to establish an
effective research effect size and sample size with an
associated alpha level to test the validity of an established
research hypothesis. Osler (2012) defined Tri-Squared as:

$$Tri^2 = T_{Sum}\left[(Tri_x - Tri_y)^2 : Tri_y\right]$$

10.2 Cronbach's Alpha [α]


One of the significant statistical models in this research is
Cronbach's Alpha [α]. It is a valuable coefficient for
examining internal consistency and is named after
Lee Cronbach, who first developed it in 1951. Bland
and Altman (1997) defined Cronbach's Alpha as:


$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_T^2}\right)$$


where k is the number of items (objects), $s_i^2$ is the
variance of the i-th item, and $s_T^2$ is the variance of the
total score created by adding all the items. In addition, they
said that if the items are not simply added to make the total,
but are first multiplied by weighting coefficients, then each
item must be multiplied by its coefficient before the variance
$s_i^2$ is calculated. The formula must contain at least two
items, that is, k > 1, or α cannot be defined. Field
(2009) defined Cronbach's Alpha somewhat differently from
Bland and Altman (1997), although the ideas are similar.
Field defined Cronbach's Alpha as:

$$\alpha = \frac{N^2\,\overline{Cov}}{\sum s^2_{item} + \sum Cov_{item}}$$

The author noted that, for every item on the scale, two
things can be computed: the variance within the item, and the
covariance between that item and any other item on the scale.
Thus, a variance-covariance matrix of all the items can be
computed. In this matrix, the diagonal elements are the
variances of the individual items, and the off-diagonal elements
are the covariances between pairs of items. The numerator of the
formula is the number of items (N) squared, multiplied by
the mean covariance between items. The denominator is
the sum of all the item variances and item
covariances. The alpha statistic ranges between
zero and one. The greater the coefficient, the better the
selected items work together in evaluating the instrument
construct, and thus the better the statistical reliability of
the assessment tool. An alpha of 1.00 would imply a perfectly
consistent instrument, while a coefficient of zero would
imply an unreliable tool (Osler and Mansaray, 2013).
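The two definitions of Cronbach's Alpha given above can be compared numerically. The following is a minimal sketch only, not the computation used in the study: the ratings matrix is hypothetical (six respondents, four items on a 5-point scale), and the function names `cronbach_alpha` and `cronbach_alpha_cov` are illustrative assumptions. Because the variance of the total score equals the sum of all item variances and covariances, the two forms are expected to agree.

```python
from statistics import mean, variance

# Hypothetical 5-point Likert ratings: one row per respondent, one column per item.
ratings = [
    [5, 4, 5, 5],
    [4, 4, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]


def cronbach_alpha(scores):
    """Variance form: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha is undefined for fewer than two items")
    items = list(zip(*scores))                          # one tuple per item
    item_variances = [variance(col) for col in items]   # s_i^2
    total_variance = variance([sum(row) for row in scores])  # s_T^2
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def cronbach_alpha_cov(scores):
    """Covariance form: N^2 * mean covariance / (sum of variances + sum of covariances)."""
    items = list(zip(*scores))
    n_items, n_resp = len(items), len(scores)
    means = [mean(col) for col in items]

    def cov(i, j):
        # Sample covariance between items i and j (n - 1 in the denominator).
        return sum(
            (items[i][r] - means[i]) * (items[j][r] - means[j])
            for r in range(n_resp)
        ) / (n_resp - 1)

    variances = [cov(i, i) for i in range(n_items)]
    covariances = [cov(i, j) for i in range(n_items)
                   for j in range(n_items) if i != j]
    mean_cov = sum(covariances) / len(covariances)
    return n_items ** 2 * mean_cov / (sum(variances) + sum(covariances))


alpha_variance_form = cronbach_alpha(ratings)
alpha_covariance_form = cronbach_alpha_cov(ratings)
print(round(alpha_variance_form, 3), round(alpha_covariance_form, 3))
```

For real survey data, the same functions would take one row per completed instrument and one column per scale item.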

10.3 Factorial Analysis 

The factorial model used in this study is derived from
Agbetsiafa (2010) and Field (2009). Concisely, factor
analysis allows the delineation of an underlying or hidden
structure in a data set. It facilitates the analysis of the
structure of the associations (correlations) among a
large number of variables by describing a set of shared
underlying dimensions, commonly termed factors
(Agbetsiafa, 2010). Field (2009) noted that the factor model is
a mathematical model resembling a linear equation but
without the intercept, because the lines intersect at zero
and, therefore, the intercept is also zero. Field (2009)
defined the factor model as:

$$Y_i = b_1 X_{1i} + b_2 X_{2i} + \ldots + b_n X_{ni} + \varepsilon_i$$

[The values of b are the factor loadings].

Agbetsiafa (2010) was more detailed in his description of
the factor model than Field (2009).
According to Agbetsiafa, it is possible to reorient the
data so that the first few dimensions account for much of the
available information. If there is any redundancy in the data
set, it is also possible to account for most of the information
in the original data with a significantly reduced number of
dimensions. Adapting his template, this study also assumes that
the 15 items on the student evaluation survey instrument are
related to a series of functions working linearly, which may be
represented by the following mathematical formulas:

$$Y_1 = a_{1,0} + a_{1,1} X_1 + \ldots + a_{1,n} X_n + e_1$$
$$Y_2 = a_{2,0} + a_{2,1} X_1 + \ldots + a_{2,n} X_n + e_2$$
$$Y_3 = a_{3,0} + a_{3,1} X_1 + \ldots + a_{3,n} X_n + e_3$$
$$\vdots$$
$$Y_{15} = a_{15,0} + a_{15,1} X_1 + \ldots + a_{15,n} X_n + e_{15}$$

where Y = a variable with known data; a = a constant;
$X_j$ = the underlying factors; and $e_i$ = the error terms,
which indicate that the hypothesized associations are
not exhaustive. Thus, applying the technique to the
15 known items on the student rating survey
instrument, factor analysis describes the unknown X
functions. The loadings that emerge from the analysis are the
constants, and the factors are the X functions. The size of
the individual loading for every function assesses the degree to
which that function is associated with the specific
variable (Y). Thus, for any of the 15 variables, for instance
the first, the model may be written as:
$Y_1 = a_1 X_1 + a_2 X_2 + a_3 X_3 + \ldots + a_n X_n + e_1$,
where the X's denote factors and the a's signify the loadings
(Osler and Mansaray, 2013a).
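To make the idea of loadings concrete, the sketch below extracts the loadings of the first principal component from an item correlation matrix using power iteration. It is a simplified illustration under stated assumptions, not the procedure used in the study: the ratings are hypothetical, only one unrotated component is extracted (the study reports principal component extraction with varimax rotation), and the names `correlation_matrix` and `first_component_loadings` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical 5-point Likert ratings: one row per respondent, one column per item.
ratings = [
    [5, 4, 5, 5],
    [4, 4, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]


def correlation_matrix(scores):
    """Pearson correlations between every pair of items (columns)."""
    items = list(zip(*scores))
    n_resp, k = len(scores), len(items)
    means = [mean(col) for col in items]
    sds = [stdev(col) for col in items]
    return [
        [
            sum((items[i][r] - means[i]) * (items[j][r] - means[j])
                for r in range(n_resp)) / ((n_resp - 1) * sds[i] * sds[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def first_component_loadings(corr, iterations=500):
    """Power iteration for the dominant eigenvector of the correlation matrix.

    Loadings are the eigenvector entries scaled by the square root of the
    eigenvalue, i.e. the correlation of each item with the first component.
    """
    k = len(corr)
    vector = [1.0 / sqrt(k)] * k
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(
        vector[i] * sum(corr[i][j] * vector[j] for j in range(k))
        for i in range(k)
    )
    return eigenvalue, [x * sqrt(eigenvalue) for x in vector]


corr = correlation_matrix(ratings)
eigenvalue, loadings = first_component_loadings(corr)
print("eigenvalue:", round(eigenvalue, 3))
print("loadings:", [round(value, 3) for value in loadings])
```

In this form, a loading close to 1 means the item is strongly associated with the extracted component, which is the sense in which the loadings are read in the text above.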

11. ESET Research at an HBCU 

The backdrop for this research study is the Research,
Evaluation, and Planning Department (REP) of a Historic Black
College and University. The study makes use of the
student rating responses collected during the spring
semester of 2012 by REP. This is a cross-sectional data set.
REP has the responsibility to coordinate all
student, faculty, and administrative surveys on behalf of the
university. 7,919 students responded to the spring semester
survey. The students were graduates and undergraduates
from the various departments, schools, and
colleges within the university, with the exclusion of the Law
School. The ratings from the Law School have different
components, which are incompatible with the ratings of
the rest of the schools and colleges in the University (Osler
and Mansaray, 2013b).

i-manager's Journal on School Educational Technology, Vol. 11 • No. 2 • September - November 2015

11.1 The SRI Measurement Scale 

The student ratings of instruction survey is employed to
evaluate course instructors and is administered online
during the spring and fall semesters of each academic
year, with the CourEval assessment tool on a 5-point Likert
scale, to all registered students of the University. The rating
survey requires students to assess their instructors on 15
items in the assessment tool. The instrument has two
subscales. Items 1 to 3 measure the student's efforts in the
course, where the scale comprises the following: 1 = never,
2 = not much of the time, 3 = about half of the time, 4 =
most of the time, and 5 = all of the time. Items 4 to 15
evaluate the instructor, where the scale comprises the
following: 1 = strongly disagree, 2 = disagree, 3 = no
opinion, 4 = agree, and 5 = strongly agree. This research
considers the evaluation of the instructor as an assessment
of teaching effectiveness or teaching quality. The
instrument also has a section where students can make
open-ended statements about the instructors, when these
are requested by the individual colleges or departments
within the University. For the 2012 spring semester ratings
survey, the 7,919 responders evaluated instructors on
18,817 courses and course sections offered at the HBCU. In
addition, the only variables included in this study are the 15
items on the survey instrument with their respective ratings.

12. Research Methodology 

The qualitative-into-quantitative mixed research methodology
was chosen because of its alignment with the problem and purpose
statement of the planned study. The study assessed whether
student ratings and senior ratings together can produce robust
validity and reliability in assessing teaching efficiency and
student learning. The designs for these analyses are the SET and
GSS ratings. These are student surveys, which are applicable
in assessing faculty, and are administered online each
semester by the selected university. The surveys are on a
5-point Likert scale, which gives them a quantitative character.
Quantitative research seeks to understand the features of an
observed event that can ordinarily be measured, or to evaluate
a probable association between two elements (Cozby, 2012;
Leedy & Ormrod, 2010). In other words, the
qualitative-into-quantitative mixed research methodology is a
recognized, impartial, methodical procedure in which numerical
data are used to obtain information on a phenomenon. The
procedure is employed to classify variables and assess the
association between variables. It is against this background
that this approach is the most applicable research method for
the planned study, because the study applies only
numerical data in all its analyses.

13. ESET Statistical Tools used to Analyze Data and Report Results

13.1 Post Hoc Tri-Squared Test Analysis 

The Cronbach's Alpha Reliability Model was applied to
three factors as qualitative outcomes to
determine the ESET SRI efficacy using the Tri-Squared Test
statistic.

The data were analyzed using the Trichotomous-Squared Test.
A standard Three by Three Table was designed to analyze the
research questions and data extracted from an Inventive
Investigative Instrument with the following
Trichotomous Categorical Variables: $a_1$ = Is the Student
Rating of Instruction Instrument effective?; $a_2$ = Is the
Student Rating of Instruction Instrument valid?; and $a_3$ = Is
the Student Rating of Instruction Instrument consistent? The
3x3 Table has the following Trichotomous Outcome
Variables: $b_1$ = Yes; $b_2$ = No; and $b_3$ = No Opinion. The
Inputted Qualitative Outcomes are reported as shown in
Figure 1 (for 56,451 items from 7,919 SRIs [Grand Total SRI];
all results are from Osler and Mansaray, 2013a).

The Tri-Square Test Formula for the Transformation of
Trichotomous Qualitative Outcomes into Trichotomous
Quantitative Outcomes to Determine the Validity of the
Research Hypothesis:

$n_{Tri}$ = 56451 [Total Items]; α = 0.001

                                TRICHOTOMOUS TESTING INPUT VARIABLES
                                    a1          a2          a3
TRICHOTOMOUS RESULTS      b1      15629       15751       15125
OUTPUT VARIABLES          b2       1222        1058        1406
                          b3       1966        2008        2286

$Tri^2$ d.f. = [C - 1][R - 1] = [3 - 1][3 - 1] = 4

Figure 1. Post Hoc Tri-Squared Test



$Tri^2$ Critical Value Table = 18.467 (with d.f. = 4 at α = 0.001).
For d.f. = 4, the Critical Value for p > 0.001 is 18.467. The
Calculated Tri-Square value is 92.531; thus, the null
hypothesis ($H_0$) is rejected by virtue of the hypothesis test,
which yields the following: Tri-Squared Critical Value of
18.467 < 92.531 Calculated Tri-Squared Value. Results: (1)
Tri-Squared Calculated Value = 92.531; (2) Tri-Squared
Degrees of Freedom = 4; (3) Tri-Squared Probability =
0.0096; (4) Tri-Squared Alpha Level = 0.001 [for $n_{Tri}$ =
56451 [Total Items] Maximized Test Critical Value].
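The calculated value reported above can be checked from the cell counts in Figure 1. The sketch below is illustrative rather than the authors' own procedure: it assumes that $Tri_x$ is the observed cell count and that $Tri_y$ is the count expected under the null hypothesis, obtained from the row and column totals, and it takes the critical value of 18.467 directly from the text instead of computing it. The names `expected_counts` and `tri_squared` are hypothetical.

```python
# Observed 3x3 counts from Figure 1: rows are outcomes b1..b3, columns are a1..a3.
observed = [
    [15629, 15751, 15125],  # b1 = Yes
    [1222, 1058, 1406],     # b2 = No
    [1966, 2008, 2286],     # b3 = No Opinion
]

CRITICAL_VALUE = 18.467  # reported in the text for d.f. = 4 at alpha = 0.001


def expected_counts(table):
    """Expected cell counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def tri_squared(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


statistic = tri_squared(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
reject_null = statistic > CRITICAL_VALUE

print(f"Tri-Squared = {statistic:.3f}, d.f. = {df}, reject H0: {reject_null}")
```

Under these assumptions the statistic agrees with the calculated value of 92.531 given in the text, and the comparison against the critical value leads to the same decision.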

13.2 Tri-Squared Percentage Deviations 

Percentage deviation and standardized residual are both 
measures of the degree to which an observed Tri-Squared 
cell frequency differs from the value that would be 
expected on the basis of the null hypothesis. Figure 2 shows 
the Tri-Squared Percentage Deviations. 

13.3 Tri-Squared Standardized Residuals 

The standardized residual for a cell in a Tri-Squared table is
a version of the standard normal deviate, $z_{Tri}$, calculated
as follows:

$$z_{Tri} = \frac{Tri_x - Tri_y}{\sqrt{Tri_y}}$$

where $z_{Tri}$ = The Tri-Squared Calculated Standard Normal
Deviate; $Tri_x$ = Trichotomous Qualitative Outcomes; and
$Tri_y$ = Trichotomous Quantitative Outcomes. Assuming the null
hypothesis to be true, values of the standardized residual
belong to a normally distributed sampling distribution with a
mean of zero and a standard deviation of 1.0. Figure 3
shows the Tri-Squared Standardized Residuals.

            a1         a2         a3
b1        +1.02      +2.00      -3.03
b2        -0.18      -4.85      +5.06
b3        -2.64      -1.72      +4.36

Figure 3. Tri-Squared Standardized Residuals
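Both per-cell measures can be derived from the same 3x3 counts. The following sketch assumes, as an interpretation of the definitions above, that the observed count plays the role of $Tri_x$ and the count expected under the null hypothesis plays the role of $Tri_y$; the variable names `pct_deviation` and `z_tri` are illustrative.

```python
from math import sqrt

# Observed 3x3 counts from Figure 1: rows are outcomes b1..b3, columns are a1..a3.
observed = [
    [15629, 15751, 15125],  # b1 = Yes
    [1222, 1058, 1406],     # b2 = No
    [1966, 2008, 2286],     # b3 = No Opinion
]


def expected_counts(table):
    """Expected cell counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


expected = expected_counts(observed)

# Percentage deviation: how far each observed count lies from its expected
# count, expressed as a percentage of the expected count.
pct_deviation = [
    [100 * (obs - exp) / exp for obs, exp in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Standardized residual: (observed - expected) / sqrt(expected).
z_tri = [
    [(obs - exp) / sqrt(exp) for obs, exp in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for label, pct_row, z_row in zip(("b1", "b2", "b3"), pct_deviation, z_tri):
    print(label,
          [f"{p:+.1f}%" for p in pct_row],
          [f"{z:+.2f}" for z in z_row])
```

Cells with large positive or negative residuals are the ones that contribute most to the overall Tri-Squared value.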

13.4 Goodman & Kruskal's Lambda (λ) Tri-Squared Results

Goodman & Kruskal's Lambda (λ) is a cross tabulation
analysis measure of proportional reduction in error.
Lambda indicates the extent to which the modal
categories and frequencies for each value of the
independent variable differ from the overall modal
category and frequency. The Goodman-Kruskal values for
Lambda range from zero (indicating that there is "no
association" between independent and dependent
variables) to one (indicating a "perfect association"
between independent and dependent variables). It is
calculated with the following equation:

$$\lambda = \frac{E_1 - E_2}{E_1}$$

where $E_1$ is the overall non-modal frequency, and $E_2$ is
the sum of the non-modal frequencies for each value of the
independent variable. Table 1 shows the Goodman &
Kruskal's Lambda (λ) Tri-Squared Results.
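The lambda equation above can be sketched for any cross tabulation in which the rows are the dependent outcomes and the columns are the values of the independent variable. The two small tables below are hypothetical and are chosen only to show the range of the measure; the function name `goodman_kruskal_lambda` is an assumption for illustration.

```python
def goodman_kruskal_lambda(table):
    """Proportional reduction in error for predicting the row (dependent) category.

    table: list of rows, one per dependent outcome; columns are the values of
    the independent variable.
    """
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    # E1: cases outside the overall modal outcome category.
    e1 = grand_total - max(row_totals)
    if e1 == 0:
        raise ValueError("lambda is undefined when all cases share one outcome")
    # E2: cases outside the modal outcome within each independent category.
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1


# Hypothetical table with a perfect association: each column has one outcome.
perfect_table = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 10],
]

# Hypothetical table with a partial association.
partial_table = [
    [30, 10],
    [10, 30],
]

print(goodman_kruskal_lambda(perfect_table))
print(goodman_kruskal_lambda(partial_table))
```

When the same outcome category is modal in every column, knowing the independent variable does not reduce prediction error, and the measure falls to zero.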



            a1         a2         a3
b1        +0.8%      +1.6%      -2.4%
b2        -0.5%     -13.8%     +14.4%
b3        -5.8%      -3.8%      +9.6%

Figure 2. Tri-Squared Percentage Deviations


Cross Tabulation of Variables

                                      Independent Variables
Dependent Variables            Categorical   Categorical   Categorical   Results
                               Variable 1    Variable 2    Variable 3
                               = a1          = a2          = a3
Outcome Variable 1 = b1          15629         15751         15125        46505
Outcome Variable 2 = b2           1222          1058          1406         3686
Outcome Variable 3 = b3           1966          2008          2286         6260
Results                          18817         18817         18817        56451

Table 1. Goodman & Kruskal's Lambda (λ) Tri-Squared Table

14. Summary and Recommendations

The research employed the responses from the ESET ratings
to assess instructional (teaching) effectiveness and student
success (its inherent outcome). Students were requested to
rate the numerical variables on the ESET survey on a 5-point
Likert scale. Exploratory data analysis was important as a means
of measurement for the numerical independent variables that were
used to test the research hypotheses. In addition, the
Cronbach's Alpha statistic was used to measure and assess
the internal consistency (reliability) of the ESET instrument
that students used to evaluate faculty teaching
efficacy. Factor analysis was another
statistic used in the study, to determine the
construct validity of the ESET instrument. The authors
make the following recommendations regarding the use of
ESET as a novel universal SRI (Student Ratings of Instruction)
tool:

(1) The ESET and associated measurement procedures should be
officially supported by the institution of its origin as a base
operating procedure for the continuous improvement of
ongoing collegiate teaching and learning.

(2) The ESET structure should be commercialized and
branded, as supportive research regarding its magnitude
clearly illustrates that the ESET has great value with regard to
learning measurement and assessment.

(3) The ESET model should be adopted by more institutions so
that it becomes an "industry standard" at similar
institutions of higher learning.

Conclusion 

Student ratings are very commonly used to evaluate
teaching quality in many universities worldwide. There are,
however, some disagreements relating to their validity and
reliability as a measurement of teaching effectiveness. The
ESET design offers a statistically quantifiable methodology
that can be readily used by any institution seeking to
effectively evaluate its faculty teaching efficacy. This novel
instrumentation is also applicable for faculty assessment in
the self-assessment of learner performance. When the ESET
instrument is employed in conjunction with its
multiplicative statistical methodology, the joint analytical
designs reveal the robust validity and reliability of
the ESET instrument. The ESET (as a measurement of
teaching effectiveness and student learning) is an
exceedingly usable statistical tool. Information
on data gathering, assumptions, hypotheses, and the
nature of the research methodologies and statistical
models was presented in this study. The use of the ESET in
future research will generate new ideas on the uses of
instructional effectiveness metrics to determine curricular
outcomes for formative program evaluation (as
"curriculum development") and summative evaluation for
faculty assessment purposes (expressed as: tenure, faculty
merit pay increases, and the retention of instructors for vital
non-tenured [adjunct] faculty positions).